An example of forecasting time series data with multiple seasonal patterns and trend with linear regression. Using the demand for electricity in the UK dataset from the UKgrid package
library(TSstudio)
library(UKgrid)
library(magrittr)
library(lubridate)
library(forecast)
df1 <- extract_grid(type = "data.frame", # Set the output format
columns = "ND", # Select the Net Demand for elctricity column
aggregate = "daily") # Set the agrregation level
head(df1)
## TIMESTAMP ND
## 1 2011-01-01 1671744
## 2 2011-01-02 1760123
## 3 2011-01-03 1878748
## 4 2011-01-04 2076052
## 5 2011-01-05 2103866
## 6 2011-01-06 2135202
ts.obj <- ts(df1$ND, start = c(2011, 1), frequency = 365)
ts_plot(ts.obj = ts.obj,
title = "The Demand for Electricity in the UK",
Ytitle = "MW")
Different plots for seasonality analysis
ts_seasonal(ts.obj = ts.obj)
ts_heatmap(ts.obj = df1)
ts_decompose(ts.obj = ts.obj)
ts_quantile(df1, period = "weekdays")
ts_quantile(df1, period = "monthly", n = 2)
There is a clear evidence for multiple seasonality patterns in the series:
df1$index <- 1:nrow(df1)
df1$wday <- wday(df1$TIMESTAMP, label = TRUE) %>% factor(ordered = FALSE)
df1$month <- month(df1$TIMESTAMP, label = TRUE) %>% factor(ordered = FALSE)
head(df1)
## TIMESTAMP ND index wday month
## 1 2011-01-01 1671744 1 Sat Jan
## 2 2011-01-02 1760123 2 Sun Jan
## 3 2011-01-03 1878748 3 Mon Jan
## 4 2011-01-04 2076052 4 Tue Jan
## 5 2011-01-05 2103866 5 Wed Jan
## 6 2011-01-06 2135202 6 Thu Jan
h = 365 # Set the forecast horizon
Create a data frame with the input values of the actual forecast, must have the same features we are using to train the model. In this case all the inputs are deterministic (day of the week, month, etc.), if using non-deterministic (such as temperature, prices, etc.) inputs, you will have to forecast their future values
# Create a data frame with futures dates
# using the last date of the actual data + one day
fc_df <- data.frame(date = seq.Date(from = max(df1$TIMESTAMP) + days(1), by = "days", length.out = h))
# Continue the index input
fc_df$index <- seq((max(df1$index) + 1), by = 1, length.out = h)
# Day of the week variable
fc_df$wday <- wday(fc_df$date, label = TRUE) %>% factor(ordered = FALSE)
# Month of the week variable
fc_df$month <- month(fc_df$date, label = TRUE) %>% factor(ordered = FALSE)
# Comparing between the end of the actual data frame and the beginning of the forecast data frame
tail(df1)
## TIMESTAMP ND index wday month
## 2900 2018-12-09 1454210 2900 Sun Dec
## 2901 2018-12-10 1756626 2901 Mon Dec
## 2902 2018-12-11 1777229 2902 Tue Dec
## 2903 2018-12-12 1712503 2903 Wed Dec
## 2904 2018-12-13 1731810 2904 Thu Dec
## 2905 2018-12-14 1810382 2905 Fri Dec
head(fc_df)
## date index wday month
## 1 2018-12-15 2906 Sat Dec
## 2 2018-12-16 2907 Sun Dec
## 3 2018-12-17 2908 Mon Dec
## 4 2018-12-18 2909 Tue Dec
## 5 2018-12-19 2910 Wed Dec
## 6 2018-12-20 2911 Thu Dec
Creating a training and testing partitions
ts object (ts.obj) to training and testing partitionsdata.frame to training and training partitions (must aligned with the ts.obj order)ts_par <- ts_split(ts.obj = ts.obj, sample.out = h)
train <- ts_par$train
test <- ts_par$test
train_df <- df1[1:length(train),]
test_df <- df1[(length(train) + 1):nrow(df1),]
nrow(train_df) == length(train)
## [1] TRUE
nrow(test_df) == length(test)
## [1] TRUE
Using the built-in functionality of the tslm function using trend and seasonal component
Note that by default, the tslm function define the season by the frequency of the series, which in this case is the day of the year (create categorical variable with 365 levels)
md1 <- tslm(train ~ trend + season)
fc1 <- forecast(md1, h = h)
test_forecast(actual = ts.obj,
forecast.obj = fc1,
test = test)
accuracy(fc1, test)
## ME RMSE MAE MPE MAPE MASE
## Training set 7.593520e-13 115565.9 95703.52 -0.5631289 6.238168 0.8013792
## Test set 3.026682e+04 117086.9 95400.12 1.4226202 6.579675 0.7988387
## ACF1 Theil's U
## Training set 0.5148323 NA
## Test set 0.5290950 0.9851808
With MAPE Error rate of 6.23% and 6.57% on the training and testing partitions, respectively
md2 <- tslm(train ~ trend + season + wday , data = train_df)
fc2 <- forecast(md2, h = h, newdata = test_df)
test_forecast(actual = ts.obj,
forecast.obj = fc2,
test = test)
accuracy(fc2, test)
## ME RMSE MAE MPE MAPE MASE
## Training set -2.769075e-12 64590.90 47940.93 -0.1616403 3.001469 0.4014363
## Test set 3.048622e+04 89925.85 70838.43 1.8454164 4.767082 0.5931699
## ACF1 Theil's U
## Training set 0.7422595 NA
## Test set 0.6524709 0.7364615
Doing a better job with 3.00% and 4.76% error rate on the training and testing partitions, respectively. We will use this approach to create the final forecast
Retrain the model with the data and forecast the following 365 days
final_md <- tslm(ts.obj ~ trend + season + wday, data = df1)
final_fc <- forecast(final_md, h = h, newdata = fc_df)
plot_forecast(final_fc)